- Title
- Learning latent byte-level feature representation for malware detection
- Creator
- Yousefi-Azar, Mahmood; Hamey, Len; Varadharajan, Vijay; Chen, Shiping
- Relation
- 25th International Conference on Neural Information Processing (ICONIP 2018). Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, December 13-16, 2018 Proceedings, Part IV [presented in Lecture Notes in Computer Science, Vol. 11304] (Siem Reap, Cambodia 13-16 December, 2018) p. 568-578
- Publisher Link
- http://dx.doi.org/10.1007/978-3-030-04212-7_50
- Publisher
- Springer Nature
- Resource Type
- conference paper
- Date
- 2018
- Description
- This paper proposes two byte-level feature representations of binary files for malware detection. The proposed static feature representations do not need any third-party tools and are independent of the operating system because they operate on the raw file bytes. Sparse term-frequency simhashing (s-tf-simhashing) is a faster variant of tf-simhashing: it requires less computation and outperforms the original dense tf-simhashing. The binary word2vec (Bword2vec) representation embeds the semantic relationships of the n-grams into the code vectors. Bword2vec employs a binary-to-word2vec representation with a lower feature-space dimension than s-tf-simhashing, which further reduces the computation required by the classifier. We show that the proposed techniques can be used successfully to analyze both full malware apps and infected files. The experiments are conducted on real Android and PDF malware datasets.
- Subject
- malware detection; binary-level feature representation; sparse term-frequency simhashing; binary word2vec
- Identifier
- http://hdl.handle.net/1959.13/1408869
- Identifier
- uon:35896
- Identifier
- ISBN:9783030042110
- Language
- eng
- Reviewed